What Does It Take to Develop a Million Lines of Open Source Code?

Authors

  • Juan Fernández-Ramil
  • Daniel Izquierdo-Cortazar
  • Tom Mens

Abstract

This article presents a preliminary, exploratory study of the relationship between size, on the one hand, and effort, duration and team size, on the other, for 11 Free/Libre/Open Source Software (FLOSS) projects whose current size ranges between 0.6 and 5.3 million lines of code (MLOC). Effort was operationalised based on the number of active committers per month. The extracted data did not fit well with an early version of COCOMO, a cost estimation model for proprietary (closed-source) software, suggesting overall that, at least to some extent, FLOSS communities are more productive than closed-source teams. This also motivated the need for FLOSS-specific effort models. As a first approximation, we evaluated 16 linear regression models involving different pairs of attributes. One of our experiments was to calculate the net size, that is, to remove any suspiciously large outliers or jumps in the growth trends. The best model we found related effort to net size and accounted for 79 percent of the variance. This model was based on data excluding a possible outlier (Eclipse), the largest project in our sample. This suggests that different effort models may be needed for certain categories of FLOSS projects. Incidentally, for each of the 11 individual FLOSS projects we were able to model the net size trends with very high accuracy (R² ≥ 0.98). Of the 11 projects, 3 have grown superlinearly, 5 linearly and 3 sublinearly, suggesting that in the majority of cases accumulated complexity is either well controlled or does not constitute a growth-constraining factor.
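
The kind of model the abstract describes (a linear regression of effort against net size, judged by the share of variance it explains) can be sketched as follows. This is a minimal illustration, not the authors' procedure: the function name `fit_line`, the unit chosen for effort, and all numeric values are assumptions made up for the example and are not data from the study.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b*x.

    Returns (a, b, r_squared), where r_squared is the fraction of the
    variance in ys that the fitted line accounts for.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx                 # slope
    a = mean_y - b * mean_x       # intercept
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return a, b, 1.0 - ss_res / ss_tot


# Hypothetical placeholder values, NOT taken from the paper:
# net size in MLOC and effort in an assumed unit of committer-months.
net_size_mloc = [0.6, 1.0, 2.0, 3.0, 5.0]
effort = [10.0, 22.0, 38.0, 65.0, 98.0]

intercept, slope, r_squared = fit_line(net_size_mloc, effort)
print(f"effort ~ {intercept:.2f} + {slope:.2f} * net_size, R^2 = {r_squared:.3f}")
```

Under these assumptions, an R² of 0.79 would correspond to the "79 percent of the variance" reported for the best model, and the per-project growth trends would be fitted the same way with time on the x-axis and net size on the y-axis.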

Similar resources

Mining Internet-Scale Software Repositories

Large repositories of source code create new challenges and opportunities for statistical machine learning. Here we first develop an infrastructure for the automated crawling, parsing, and database storage of open source software. The infrastructure allows us to gather Internet-scale source code. For instance, in one experiment, we gather 4,632 Java projects from SourceForge and Apache totaling...

A needle in the stack: efficient clone detection for huge collections of source code

One of the important uses of source code clone detection analysis is plagiarism detection, where a file is compared against a known corpus of source code to try to find potential matches. As the availability of Free and Open Source Software (FOSS) continues to increase it has become important to know if specific source code has been created from copies of FOSS software. Version 5.0.2 of Debian ...

Mistaking the Map for the Territory: What Society Does With Medicine; Comment on “Medicalisation and Overdiagnosis: What Society Does to Medicine”

Van Dijk et al describe how society’s influence on medicine drives both medicalisation and overdiagnosis, and allege that a major political and ethical concern regarding our increasingly interpreting the world through a biomedical lens is that it serves to individualise and depoliticize social problems. I argue that for medicalisation to serve this purpose, it would have to exclude the possibil...

Mining Internet-Scale Software Repositories

Large repositories of source code create new challenges and opportunities for statistical machine learning. Here we first develop Sourcerer, an infrastructure for the automated crawling, parsing, and database storage of open source software. Sourcerer allows us to gather Internet-scale source code. For instance, in one experiment, we gather 4,632 Java projects from SourceForge and Apache totali...

Sourcerer: An infrastructure for large-scale collection and analysis of open-source code

A large amount of open source code is now available online, presenting a great potential resource for software developers. This has motivated software engineering researchers to develop tools and techniques to allow developers to reap the benefits of these billions of lines of source code available online. However, collecting and analyzing such a large quantity of source code presents a number ...

Publication date: 2009